- Article 441 of comp.protocols.tcp-ip:
- From: hedrick@topaz.rutgers.edu (Charles Hedrick)
- Subject: TCP/IP introduction
- Message-ID: <12979@topaz.rutgers.edu>
- Date: 28 Jun 87 07:52:42 GMT
-
-
- I keep seeing requests on various newsgroups for an introduction to
- TCP/IP. I also get such requests locally. I believe that the only
- appropriate description of TCP/IP is the RFC's. However I also think
- a brief introduction is likely to be helpful before plowing right
- into them. The following document is an attempt to do that. It also
- recommends some RFC's to look at and tells you how to get them.
-
- --------------------------------
-
- This document is a brief introduction to TCP/IP, followed by advice on
- what to read for more information. This is not intended to be a
- complete description, but merely enough of an introduction to allow
- you to start reading the RFC's. At the end of the document there will
- be a list of the RFC's that we recommend reading.
-
- TCP/IP is a set of protocols developed to allow cooperating computers
- to share resources across a network. It was developed by a community
- of researchers centered around the ARPAnet. Certainly the ARPAnet is
- the best-known TCP/IP network. However as of June 1987, at least 130
- different vendors had products that support TCP/IP, and thousands of
- networks of all kinds use it.
-
- First some basic definitions. Although TCP/IP (or IP/TCP) seems to be
- the most common term these days, most of the documentation refers to
- the "Internet protocols". The Internet is a collection of networks,
- including the Arpanet, NSFnet, regional networks such as NYsernet,
- local networks at a number of universities and research institutions,
- and a number of military networks. The term "Internet" applies to
- this entire set of networks. The subset of them which is managed by
- the Department of Defense is referred to as the "DDN" (Defense Data
- Network). This includes some research-oriented networks, such as the
- Arpanet, as well as more strictly military ones. (Because much of the
- funding for Internet protocol development comes through the DDN
- organization, the terms Internet and DDN can sometimes seem
- equivalent.) All of these networks are connected to each other, and
- users can send messages from any of them to any other (except where
- security or other policy restrictions control access). Officially
- speaking, the Internet protocol documents are simply standards adopted
- by the Internet community for its own use. More recently, the
- Department of Defense issued a MILSPEC definition of TCP/IP. This was
- intended to be a more formal definition, appropriate for use in
- purchasing specifications. However most of the TCP/IP community
- continues to use the Internet standards. The MILSPEC version is
- intended to be consistent with them.
-
- Whatever it is called, TCP/IP is a family of protocols. A few are
- basic ones used for many applications. These include IP, TCP, and
- UDP. Others are protocols for doing specific tasks, e.g. transferring
- files between computers, sending mail, or finding out who is logged in
- on another computer. Any real application will use several of these
- protocols. A typical situation is sending mail. First, there is a
- protocol for mail. This defines a set of commands which one machine
- sends to another, e.g. commands to specify who the sender of the
- message is, who it is being sent to, and then the text of the message.
- However this protocol assumes that there is a way to communicate
- reliably between the two computers. Mail, like other application
- protocols, simply defines a set of commands and messages to be sent.
- It is designed to be used together with TCP and IP. TCP is responsible
- for making sure that the commands get through to the other end. It
- keeps track of what is sent, and retransmits anything that did not
- get through. If any message is too large for one packet, e.g. the
- text of the mail, TCP will split it up into several packets, and make
- sure that they all arrive correctly. Since these functions are needed
- for many applications, they are put together into a separate protocol,
- rather than being part of the specifications for sending mail. You
- can think of TCP as forming a library of routines that applications
- can use when they need reliable network communications with another
- computer. Similarly, TCP calls on the services of IP. Although the
- services that TCP supplies are needed by many applications, there are
- still some kinds of applications that don't need them. However there
- are some services that every application needs. So these services are
- put together into IP. As with TCP, you can think of IP as a library
- of routines that TCP calls on, but which is also available to
- applications that don't use TCP. This strategy of building several
- levels of protocol is called "layering". We think of the applications
- programs such as mail, TCP, and IP, as being separate "layers", each
- of which calls on the services of the layer below it. Generally,
- TCP/IP applications use 4 layers:
-
- - an application protocol such as mail
- - a protocol such as TCP that provides services needed by many
-   applications
- - IP, which provides the basic service of getting packets to their
-   destination
- - the protocols needed to manage a specific physical medium, such as
-   Ethernet or a point to point line.
-
- TCP/IP is based on the "catenet model". (This is described in more
- detail in ien-48.txt.) This model assumes that there are a large
- number of independent networks connected together by gateways. The
- user should be able to access computers or other resources on any of
- these networks. Packets will often pass through a dozen different
- networks before getting to their final destination. The routing
- needed to accomplish this should be completely invisible to the user.
- As far as the user is concerned, all he needs to know in order to
- access another system is an "Internet address". This is an address
- that looks like 128.6.4.194. It is actually a 32-bit number. However
- it is normally written as 4 decimal numbers, each representing 8 bits
- of the address. (The term "octet" is used by Internet documentation
- for such 8-bit chunks. The term "byte" is not used, because TCP/IP is
- supported by some computers that have byte sizes other than 8 bits.)
- Generally the structure of the address gives you some information
- about how to get to the system. For example, 128.6 is a network
- number assigned by a central authority to Rutgers University. Rutgers
- uses the next octet to indicate which of the campus Ethernets is
- involved. 128.6.4 happens to be an Ethernet used by the Computer
- Science Department. The last octet allows for up to 254 systems on
- each Ethernet. Note that 128.6.4.194 and 128.6.5.194 would be
- different systems. (The structure of an Internet address is described
- in a bit more detail later.)
-
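- As a rough illustration of the point that a dotted address is just a
- 32-bit number written one octet at a time, here is a small Python
- sketch. The function names are invented for this example and are not
- part of any TCP/IP software.
-
```python
def dotted_to_int(address):
    """Pack a dotted address such as 128.6.4.194 into one 32-bit integer."""
    octets = [int(part) for part in address.split(".")]
    if len(octets) != 4 or any(not 0 <= o <= 255 for o in octets):
        raise ValueError("expected four octets in the range 0-255")
    value = 0
    for octet in octets:
        value = (value << 8) | octet      # each octet contributes 8 bits
    return value


def int_to_dotted(value):
    """Unpack a 32-bit integer back into the dotted form."""
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


addr = dotted_to_int("128.6.4.194")
print(hex(addr))              # 0x800604c2
print(int_to_dotted(addr))    # 128.6.4.194
```
-
- Since 128.6.4.194 and 128.6.5.194 differ in the third octet, they pack
- to different 32-bit values, which is why they name different systems.
-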
- Of course we normally refer to systems by name, rather than by
- Internet address. When we specify a name, the network software looks
- it up in a database, and comes up with the corresponding Internet
- address. Most of the network software deals strictly in terms of the
- address. (rfc882.txt describes the database used to look up names.)
-
- TCP/IP is built on "connectionless" technology. Information is transferred in
- "packets". Each of these packets is sent through the network
- individually. There are provisions to open connections to systems.
- However at some level, information is put into packets, and those
- packets are treated by the network as completely separate. For
- example, suppose you want to transfer a 15000 octet file. Most
- networks can't handle a 15000 octet packet. So the protocols will
- break this up into something like 30 500-octet packets. Each of these
- packets will be sent to the other end. At that point, they will be
- put back together into the 15000-octet file. However while those
- packets are in transit, the network doesn't know that there is any
- connection between them. It is perfectly possible that packet 14 will
- actually arrive before packet 13. It is also possible that somewhere
- in the network, an error will occur, and a packet won't get through
- at all. In that case, that packet has to be sent again. In fact,
- there are two separate protocols involved in doing this. TCP (the
- "transmission control protocol") is responsible for breaking up the
- message into packets, reassembling them at the other end, resending
- anything that gets lost, and putting things back in the right order.
- IP (the "internet protocol") is responsible for routing individual
- packets. It may seem like TCP is doing all the work. And in small
- networks that is true. However in the Internet, simply getting a
- packet to its destination can be a complex job. A connection may
- require the packet to go through several networks at Rutgers, a serial
- line to the John von Neumann Supercomputer Center, a couple of
- Ethernets there, a series of 56Kbaud phone lines to another NSFnet
- site, and more Ethernets on another campus. Keeping track of the
- routes to all of the destinations and handling incompatibilities among
- different transport media turns out to be a complex job. Note that
- the interface between TCP and IP is fairly simple. TCP simply hands
- IP a packet with a destination. IP doesn't know how this packet
- relates to any packet before it or after it.
-
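- The splitting and reassembly described above can be sketched roughly
- as follows. This is only an illustration under simplifying
- assumptions: the 500-octet size is taken from the example, the
- function names are made up, and nothing here deals with lost packets
- or retransmission.
-
```python
import random

PACKET_SIZE = 500    # octets of data per packet, as in the example above


def split_into_packets(data, size=PACKET_SIZE):
    """Return (offset, chunk) pairs; the offset lets the receiver order them."""
    return [(offset, data[offset:offset + size])
            for offset in range(0, len(data), size)]


def reassemble(packets):
    """Rebuild the original data from packets that may arrive in any order."""
    return b"".join(chunk for _, chunk in sorted(packets))


file_data = bytes(i % 251 for i in range(15000))    # a 15000-octet "file"
packets = split_into_packets(file_data)
print(len(packets))                 # 30

random.shuffle(packets)             # the network may deliver out of order
assert reassemble(packets) == file_data
```
-
- The network itself sees only the individual packets. It is the
- receiving end that uses the offsets to put them back in order.
-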
- It may have occurred to you that something is missing here. We have
- talked about Internet addresses, but not about how you keep track of
- multiple connections to a given system. Clearly it isn't enough to
- get a packet to the right destination. TCP has to know which
- connection this packet is part of. This task is referred to as
- "demultiplexing." In fact, there are several levels of demultiplexing
- going on in TCP/IP. The information needed to do this demultiplexing
- is contained in a series of "headers". A header is simply a few extra
- octets tacked onto the beginning of a packet by some protocol in order
- to keep track of it. It's a lot like putting a letter into an
- envelope and putting an address on the outside of the envelope.
- Except with modern networks it happens several times. It's like you
- put the letter into a little envelope, your secretary puts that into a
- somewhat bigger envelope, the campus mail center puts that envelope
- into a still bigger one, etc. Here is an overview of the headers that
- get stuck on a message that passes through a typical TCP/IP network:
-
- We start with a single data stream, say a file you are trying to
- send to some other computer:
-
- ......................................................
-
- TCP breaks it up into manageable chunks. (In order to do this, TCP has
- to know how large a packet your network can handle. Actually, the
- TCP's at each end say how big a packet they can handle, and then they
- pick the smaller size.)
-
- .... .... .... .... .... .... .... ....
-
- TCP puts a header at the front of each packet. This header actually
- contains at least 20 octets, but the most important ones are a source
- and destination "port number" and a "sequence number". The port
- numbers are used to keep track of different conversations. Suppose 3
- different people are transferring files. Your TCP might allocate port
- numbers 1000, 1001, and 1002 to these transfers. When you are sending
- a packet, this becomes the "source" port number, since you are the
- source of the packet. Of course the TCP at the other end has assigned
- a port number of its own for the conversation. Your TCP has to know
- the port number used by the other end as well. (It finds out when the
- connection starts, as we will explain below.) It puts this in the
- "destination" port field. Of course if the other end sends a packet
- back to you, the source and destination port numbers will be reversed,
- since then it will be the source and you will be the destination.
- Each packet has a sequence number. This is used so that the other end
- can make sure that it gets the packets in the right order, and that it
- hasn't missed any. (See the TCP specification for details. TCP
- doesn't number the packets, but the octets. So if there are 500
- octets of data in each packet, the first packet might be numbered 0,
- the second 500, the next 1000, the next 1500, etc.) Finally, I will
- mention the Checksum. This is a number that is computed by adding up
- all the octets in the packet (more or less - see the TCP spec). The
- result is put in the header. TCP at the other end computes the
- checksum again. If they disagree, then something bad happened to the
- packet in transmission, and it is thrown away. So here's what the
- packet looks like now.
-
- +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- |          Source Port          |       Destination Port        |
- +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- |                        Sequence Number                        |
- +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- |                      various other junk                       |
- +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- |                      various other junk                       |
- +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- |           Checksum            |          other junk           |
- +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- |                 your data ... next 500 octets                 |
- +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- |                            ......                             |
-
- If we abbreviate the TCP header as "T", the whole file now looks like this:
-
- T.... T.... T.... T.... T.... T.... T.... T....
-
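- To make the TCP header fields a little more concrete, here is a
- sketch in Python that puts a simplified 20-octet header in front of
- each chunk of data. Several things are assumptions made for the
- example: the function names are invented, destination port 2000 is
- an arbitrary choice, the fields shown as "junk" are filled with
- placeholder values, and the checksum covers only the header and the
- data (the real TCP checksum also takes in a few fields borrowed from
- the IP level, as the TCP specification describes).
-
```python
import struct


def internet_checksum(data):
    """One's-complement sum of 16-bit words, complemented."""
    if len(data) % 2:
        data += b"\x00"                    # pad to a whole number of words
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:                     # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_segment(src_port, dst_port, seq, data):
    """Put a simplified 20-octet TCP header in front of data."""
    def header(checksum):
        return struct.pack(
            "!HHLLBBHHH",
            src_port, dst_port,
            seq,
            0,           # "junk": acknowledgment number, unused here
            5 << 4,      # "junk": header length of 5 32-bit words
            0, 0,        # "junk": flags and window, placeholders
            checksum,
            0,           # "junk": urgent pointer
        )
    # Simplification: the real checksum also covers some IP-level fields.
    checksum = internet_checksum(header(0) + data)
    return header(checksum) + data


# Port 1000 comes from the example above; port 2000 is an arbitrary choice.
segments = [build_segment(1000, 2000, seq, b"." * 500) for seq in (0, 500, 1000)]
print(len(segments[0]))                    # 520
print(internet_checksum(segments[1]))      # 0, meaning the segment is undamaged
```
-
- The receiving end recomputes the sum over the whole segment,
- checksum field included. A result of zero means nothing was damaged.
- Anything else means the segment should be thrown away.
-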
- TCP now sends each of these packets to IP. Of course it has to tell
- IP the Internet address of the computer at the other end. Note that
- this is all IP is concerned about. It doesn't care about what is in
- the packet, or even in the TCP header. IP's job is simply to find a
- route for the packet and get it to the other end. In order to allow
- gateways or other intermediate systems to forward the packet, it adds
- its own header. The main things in this header are the source and
- destination Internet address (32-bit addresses, like 128.6.4.194), the
- protocol number, and another checksum. The source Internet address is
- simply the address of your machine. (This is necessary so the other
- end knows where the packet came from.) The destination Internet
- address is the address of the other machine. (This is necessary so
- any gateways in the middle know where you want the packet to go.) The
- protocol number tells IP at the other end to send the packet to TCP.
- Although most IP traffic uses TCP, there are other protocols that can
- use IP, so you have to tell IP which protocol to send the packet to.
- Finally, the checksum allows IP at the other end to verify that the
- header wasn't damaged in transit. Note that TCP and IP have separate
- checksums. This is because IP doesn't know anything about TCP. As
- far as IP is concerned, everything after its header is just a bunch of
- bits. So IP computes a checksum of its own header only, leaving the
- data to be covered by TCP's checksum.
- Once IP has tacked on its header, here's what the
- message looks like:
-
- +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- |                      various other junk                       |
- +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- |                      various other junk                       |
- +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- |     junk      |    Protocol   |         Header Checksum       |
- +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- |                        Source Address                         |
- +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- |                      Destination Address                      |
- +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- |  TCP header, then your data ......
-
- If we represent the IP header by an "I", your file now looks like this:
-
- IT.... IT.... IT.... IT.... IT.... IT.... IT.... IT....
-
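- The IP header can be sketched in the same spirit. Again this is an
- illustration rather than a real implementation: the function names
- are made up, the "junk" fields carry placeholder values (including an
- arbitrary time to live of 64), and the addresses are the ones used as
- examples in this document. The protocol number 6 is the value
- assigned to TCP.
-
```python
import socket
import struct

PROTO_TCP = 6    # protocol number assigned to TCP


def header_checksum(header):
    """One's-complement checksum over the header only (even length assumed)."""
    total = sum(struct.unpack("!%dH" % (len(header) // 2), header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_ip_header(src, dst, payload_len, protocol=PROTO_TCP):
    """Build a 20-octet IP header for a payload of payload_len octets."""
    def pack(checksum):
        return struct.pack(
            "!BBHHHBBH4s4s",
            (4 << 4) | 5,          # "junk": version 4, header length 5 words
            0,                     # "junk": type of service
            20 + payload_len,      # "junk": total length
            0, 0,                  # "junk": identification, fragment info
            64,                    # "junk": time to live (arbitrary)
            protocol,
            checksum,
            socket.inet_aton(src),
            socket.inet_aton(dst),
        )
    return pack(header_checksum(pack(0)))


header = build_ip_header("128.6.4.194", "128.6.4.7", payload_len=520)
print(len(header))                 # 20
print(header[9])                   # 6, the protocol field
print(header_checksum(header))     # 0, meaning the header is undamaged
```
-
- Note that the checksum here is computed over the 20 header octets
- only. IP never looks at the TCP header or the data that follow.
-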
- At this point, it's possible that no more headers are needed. If your
- computer happens to have a direct phone line connecting it to the
- destination computer, or to a gateway, it may simply send the packets
- out on the line (though likely a synchronous protocol such as HDLC
- would be used, and it would add at least a few octets at the beginning
- and end).
-
- However most of our networks these days use Ethernet. So now we have
- to describe Ethernet's headers. Unfortunately, Ethernet has its own
- addresses. The people who designed Ethernet wanted to make sure that
- no two machines would end up with the same Ethernet address.
- Furthermore, they didn't want the user to have to worry about
- assigning addresses. So each Ethernet controller comes with an
- address built in at the factory. In order to make sure that they
- would never have to reuse addresses, the Ethernet designers allocated
- 48 bits for the Ethernet address. People who make Ethernet equipment
- have to register with a central authority, to make sure that the
- numbers they assign don't overlap any other manufacturer's. Ethernet is
- a "broadcast medium". That is, it is in effect like an old party line
- telephone. When you send a packet out on the Ethernet, every machine
- on the network sees the packet. So something is needed to make sure
- that the right machine gets it. As you might guess, this involves the
- Ethernet header. Every Ethernet packet has a 14-octet header that
- includes the source and destination Ethernet address, and a type code.
- Each machine is supposed to pay attention only to packets with its own
- Ethernet address in the destination field. (It's perfectly possible
- to cheat, which is one reason that Ethernet communications are not
- terribly secure.) Note that there is no connection between the
- Ethernet address and the Internet address. Each machine has to have a
- table of what Ethernet address corresponds to what Internet address.
- (We will describe how this table is constructed a bit later.) In
- addition to the addresses, the header contains a type code. The type
- code is to allow for several different protocol families to be used on
- the same network. So you can use TCP/IP, DECnet, Xerox NS, etc. at
- the same time. Each of them will put a different value in the type
- field. Finally, there is a checksum. The Ethernet controller
- computes a checksum of the entire packet. When the other end receives
- the packet, it recomputes the checksum, and throws the packet away if
- the answer disagrees with the original. The checksum is put on the
- end of the packet, not in the header. The final result is that your
- message looks like this:
-
- +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- |         Ethernet destination address (first 32 bits)          |
- +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- | Ethernet dest (last 16 bits)  |Ethernet source (first 16 bits)|
- +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- |            Ethernet source address (last 32 bits)             |
- +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- |           Type code           |
- +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- |          IP header, then TCP header, then your data           |
- |                                                               |
-                                ...
- |                                                               |
- |                       end of your data                        |
- +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- |                       Ethernet Checksum                       |
- +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
-
- If we represent the Ethernet header with "E", and the Ethernet
- checksum with "C", your file now looks like this:
-
- EIT....C EIT....C EIT....C EIT....C EIT....C EIT....C
-
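- Here is a comparable sketch of the Ethernet wrapping: a 14-octet
- header in front and a 4-octet checksum behind. The two Ethernet
- addresses are made up for the example, the function names are
- invented, and zlib.crc32 merely stands in for the checksum that a
- real controller computes in hardware (bit and byte ordering on the
- wire are glossed over). The type code 0x0800 is the value assigned
- to IP.
-
```python
import struct
import zlib

ETHERTYPE_IP = 0x0800    # type code assigned to IP


def mac_to_bytes(mac):
    """Turn a colon-separated Ethernet address into 6 octets."""
    return bytes(int(part, 16) for part in mac.split(":"))


def build_frame(dst_mac, src_mac, packet, type_code=ETHERTYPE_IP):
    """Wrap packet in a 14-octet header and a trailing 4-octet checksum."""
    header = (mac_to_bytes(dst_mac) + mac_to_bytes(src_mac)
              + struct.pack("!H", type_code))
    # zlib.crc32 stands in for the checksum the controller computes.
    checksum = struct.pack("<L", zlib.crc32(header + packet))
    return header + packet + checksum


# Both addresses below are invented for the example.
frame = build_frame("02:00:00:00:00:02", "02:00:00:00:00:01", b"IT....")
print(len(frame))            # 24: 14 of header, 6 of packet, 4 of checksum
print(frame[12:14].hex())    # 0800
```
-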
- When these packets are received by the other end, of course all the
- headers are removed. The Ethernet interface removes the Ethernet
- header and the checksum. It looks at the type code. Since the type
- code is the one assigned to IP, the Ethernet device driver passes the
- packet up to IP. IP removes the IP header. It looks at the IP
- protocol field. Since the protocol type is TCP, it passes the packet
- up to TCP. TCP now looks at the packet sequence number. It uses the
- sequence numbers and other information to combine all the packets into
- the original file.
-
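- The unwrapping on the receiving side might be sketched like this.
- Each layer looks only at its own header, decides who gets the
- contents, and passes the rest up. The function name and the
- hand-built test frame are assumptions made for the example, and for
- brevity none of the checksums are verified.
-
```python
import struct

ETHERTYPE_IP = 0x0800    # Ethernet type code assigned to IP
PROTO_TCP = 6            # IP protocol number assigned to TCP


def receive_frame(frame):
    """Strip the Ethernet, IP, and TCP headers; return (port, seq, data)."""
    (type_code,) = struct.unpack("!H", frame[12:14])
    if type_code != ETHERTYPE_IP:
        return None                        # some other protocol family
    packet = frame[14:-4]                  # drop header and trailing checksum

    ip_header_len = (packet[0] & 0x0F) * 4
    if packet[9] != PROTO_TCP:
        return None                        # would be handed to UDP, ICMP, ...
    segment = packet[ip_header_len:]

    _src_port, dst_port, seq = struct.unpack("!HHL", segment[:8])
    tcp_header_len = (segment[12] >> 4) * 4
    return dst_port, seq, segment[tcp_header_len:]


# A hand-built frame with mostly zeroed fields, for illustration only.
eth = bytes(12) + struct.pack("!H", ETHERTYPE_IP)
ip = bytes([0x45, 0, 0, 0, 0, 0, 0, 0, 64, PROTO_TCP, 0, 0]) + bytes(8)
tcp = struct.pack("!HHLLBBHHH", 1000, 2000, 500, 0, 5 << 4, 0, 0, 0, 0)
frame = eth + ip + tcp + b"hello" + bytes(4)

print(receive_frame(frame))    # (2000, 500, b'hello')
```
-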
- This ends our initial summary of TCP/IP. There are still some crucial
- concepts we haven't gotten to, so we'll now go back and add details in
- several areas. (For detailed descriptions of the items discussed here,
- see rfc793.txt for TCP, rfc791.txt for IP, and rfc894.txt and
- rfc826.txt for sending IP over Ethernet.)
-
-
- Well-known sockets and the applications layer
- =============================================
-
- So far, we have described how a stream of data is broken up into
- packets, sent to another computer, and put back together. However
- something more is needed in order to accomplish anything useful.
- There has to be a way for you to open a connection to a specified
- computer, log into it, tell it what file you want, and control the
- transmission of the file. (If you have a different application in
- mind, e.g. computer mail, some analogous protocol is needed.) This is
- done by "application protocols". The application protocols run "on
- top" of TCP/IP. That is, when they want to send a message, they give
- the message to TCP. TCP makes sure it gets delivered to the other
- end. Because TCP and IP take care of all the networking details, the
- applications protocols can treat a network connection as if it were a
- simple byte stream, like a terminal or phone line.
-
- Before going into more details about applications programs, we have to
- describe how you find an application. Suppose you want to send a file
- to a computer whose Internet address is 128.6.4.7. To start the
- process, you need more than just the Internet address. You have to
- connect to the file transfer server at the other end. In general,
- network programs are specialized for a specific set of tasks. Most
- systems have separate programs to handle file transfers, remote
- terminal logins, mail, etc. When you connect to 128.6.4.7, you have
- to specify that you want to talk to the file transfer program. This
- is done by having "well-known sockets" for each program. Recall that
- TCP uses port numbers to keep track of individual conversations. User
- programs normally use more or less random port numbers. However
- specific port numbers are assigned to the programs that sit waiting
- for requests. For example, if you want to send a file, you will start
- a program called "ftp". It will open a connection using some random
- number, say 1234, for the port number on its end. However it will
- specify port number 21 for the other end. This is the official port
- number for the ftp server. Note that there are two different programs
- involved. You run ftp on your side. This is a program designed to
- accept commands from your terminal and pass them on to the other end.
- The program that you talk to on the other machine is the ftp server.
- It is designed to accept commands from the network connection, rather
- than an interactive terminal. There is no need for your program to
- use a well-known socket number for itself. Nobody is trying to find
- it. However the servers have to have well-known numbers, so that
- people can open connections to them and start sending them commands.
- The official port numbers for each program are given in "Assigned
- Numbers".
-
- Note that a connection is actually described by a set of 4 numbers:
- the Internet address at each end, and the TCP port number at each end.
- Every packet has all four of those numbers in it. (The Internet
- addresses are in the IP header, and the TCP port numbers are in the
- TCP header.) In order to keep things straight, no two connections can
- have the same set of numbers. However it is enough for any one number
- to be different. For example, it is perfectly possible for two
- different users on a machine to be sending files to the same other
- machine. This could result in connections with the following
- parameters:
-
-                 Internet addresses         TCP ports
-   connection 1  128.6.4.194, 128.6.4.7     1234, 21
-   connection 2  128.6.4.194, 128.6.4.7     1235, 21
-
- Since the same machines are involved, the Internet addresses are the
- same. Since they are both doing file transfers, one end of the
- connection involves the well-known port number for file transfer. The
- only thing that differs is the port number for the program that the
- users are running. That's enough of a difference. Generally, at
- least one end of the connection asks the network software to assign it
- a port number that is guaranteed to be unique. Normally, it's the
- user's end, since the server has to use a well-known number.
-
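- One way to picture this is as a table of connections indexed by the
- set of 4 numbers. The following Python sketch is only that, a
- picture: the table, the function name, and the stored value are all
- invented for the example.
-
```python
connections = {}    # (local addr, local port, remote addr, remote port) -> state


def open_connection(local_addr, local_port, remote_addr, remote_port):
    """Record a new connection; refuse one whose 4 numbers are already in use."""
    key = (local_addr, local_port, remote_addr, remote_port)
    if key in connections:
        raise ValueError("connection already exists: %r" % (key,))
    connections[key] = "open"
    return key


c1 = open_connection("128.6.4.194", 1234, "128.6.4.7", 21)
c2 = open_connection("128.6.4.194", 1235, "128.6.4.7", 21)
print(len(connections))    # 2
```
-
- The two entries differ only in the local port number, and that is
- enough to keep them apart. Asking for the same set of 4 numbers a
- second time is refused.
-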
- Now that we know how to open connections, let's get back to the
- applications programs. As mentioned above, once TCP has opened a
- connection, we have something that might as well be a simple wire.
- All the hard parts are handled by TCP and IP. However we still need
- some agreement as to what we send over this connection. In effect
- this is simply an agreement on what set of commands the application
- will understand, and the format in which they are to be sent.
- Generally, what is sent is a combination of commands and data. The two
- ends use context to tell them apart. For example, the mail protocol works
- like this: Your mail program opens a connection to the mail server at
- the other end. Your program gives it your machine's name, the sender
- of the message, and the recipients you want it sent to. It then sends
- a command saying that it is starting the message. At that point, the
- other end stops treating what it sees as commands, and starts
- accepting the message. Your end then starts sending the text of the
- message. At the end of the message, a special mark is sent (a line
- containing only a dot). After that, both ends understand that your program
- is again sending commands. This is the simplest way to do things, and
- the one that most applications use.
-
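- The way the receiving end uses context to separate commands from
- message text can be sketched as a small state machine. This is a toy,
- not a mail server: the function name is invented, the addresses in
- the sample session are made up, replies are not generated, and the
- rule the mail RFC gives for message lines that themselves begin with
- a dot is left out.
-
```python
def mail_server_session(lines):
    """Sort received lines into commands and message text, by context."""
    commands, message = [], []
    in_message = False
    for line in lines:
        if in_message:
            if line == ".":
                in_message = False         # end-of-message mark
            else:
                message.append(line)       # treated as text, never as a command
        else:
            commands.append(line)
            if line.upper() == "DATA":
                in_message = True          # what follows is the message
    return commands, message


# The host and mailbox names below are invented for the example.
session = [
    "HELO host-a",
    "MAIL FROM:<user@host-a>",
    "RCPT TO:<friend@host-b>",
    "DATA",
    "Hello there.",
    "QUIT",                # inside the message, so this is just text
    ".",
    "QUIT",
]
commands, message = mail_server_session(session)
print(message)             # ['Hello there.', 'QUIT']
```
-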
- File transfer is somewhat more complex. The file transfer protocol
- involves two different connections. It starts out just like mail.
- The user's program sends commands like "log me in as this user", "here
- is my password", "send me the file with this name". However once the
- command to send data is sent, a second connection is opened for the
- data itself. It would certainly be possible to send the data on the
- same connection, as mail does. However file transfers often take a
- long time. The designers of the file transfer protocol wanted to
- allow the user to continue issuing commands while the transfer is
- going on. For example, the user might make an inquiry, or he might
- abort the transfer. Thus the designers felt it was best to use a
- separate connection for the data and leave the original command
- connection for commands. (It is also possible to open command
- connections to two different computers, and tell them to send a file
- from one to the other. In that case, the data couldn't go over the
- command connection.)
-
- Remote terminal connections use another mechanism still. For remote
- logins, there is just one connection. It normally sends data. When
- it is necessary to send a command (e.g. to set the terminal type or to
- change some mode), a special character is used to indicate that the
- next character is a command. If the user happens to type that special
- character as data, two of them are sent.
-
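- The doubling trick can be sketched as follows. In the remote login
- protocol the special character is the octet 255 (called IAC,
- "interpret as command"). The function names are invented, and the
- sketch follows the simplified picture given above, in which exactly
- one command octet follows the special character. The option
- negotiation commands of the real protocol carry a further octet,
- which is not handled here.
-
```python
IAC = 255    # the special "interpret as command" octet
NOP = 241    # a command that does nothing, used here as a sample


def escape_data(data):
    """Double every IAC in user data so it is not taken for a command."""
    return data.replace(bytes([IAC]), bytes([IAC, IAC]))


def split_stream(stream):
    """Separate a received stream into (data, list of command octets)."""
    data, commands = bytearray(), []
    i = 0
    while i < len(stream):
        octet = stream[i]
        if octet != IAC:
            data.append(octet)
            i += 1
        elif i + 1 >= len(stream):
            break                          # incomplete; wait for more input
        elif stream[i + 1] == IAC:
            data.append(IAC)               # doubled: one literal data octet
            i += 2
        else:
            commands.append(stream[i + 1])
            i += 2
    return bytes(data), commands


stream = escape_data(b"a\xffb") + bytes([IAC, NOP]) + b"c"
print(split_stream(stream))    # (b'a\xffbc', [241])
```
-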
- We are not going to describe the application protocols in detail in
- this document. It's better to read the RFC's yourself. However there
- are a couple of common conventions used by applications that will be
- described here. First, the common network representation: TCP/IP is
- intended to be usable on any computer. Unfortunately, not all
- computers agree on how data is represented. There are differences in
- character codes (ASCII vs. EBCDIC), in end of line conventions
- (carriage return, line feed, or a representation using counts), and in
- whether terminals expect characters to be sent individually or a line
- at a time. In order to allow computers of different kinds to
- communicate, each applications protocol defines a standard
- representation. Note that TCP and IP do not care about the
- representation. TCP simply sends octets. However the programs at
- both ends have to agree on how the octets are to be interpreted. The
- RFC for each application specifies the standard representation for
- that application. Normally it is "net ASCII". This uses ASCII
- characters, with end of line denoted by a carriage return followed by
- a line feed. For remote login, there is also a definition of a
- "standard terminal", which turns out to be a half-duplex terminal with
- echoing happening on the local machine. Most applications also make
- provisions for the two computers to agree on other representations
- that they may find more convenient. For example, PDP-10's have 36-bit
- words. There is a way that two PDP-10's can agree to send a 36-bit
- binary file. Similarly, two systems that prefer full-duplex terminal
- conversations can agree on that. However each application has a
- standard representation, which every machine must support.
-
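- As a small illustration of a standard representation, here is a
- sketch of converting text to and from "net ASCII" on a system whose
- local end of line is a single line feed. The function names are made
- up, and the treatment the RFC's give to a carriage return that is not
- part of an end of line is ignored.
-
```python
def to_net_ascii(text):
    """Convert local text to octets with carriage return, line feed line ends."""
    normalized = text.replace("\r\n", "\n")     # leave existing pairs alone
    return normalized.replace("\n", "\r\n").encode("ascii")


def from_net_ascii(data, local_newline="\n"):
    """Convert received net ASCII octets to the local end of line convention."""
    return data.decode("ascii").replace("\r\n", local_newline)


wire = to_net_ascii("first line\nsecond line\n")
print(wire)                     # b'first line\r\nsecond line\r\n'
print(from_net_ascii(wire) == "first line\nsecond line\n")    # True
```
-
- A system with some other convention would pass a different value
- for local_newline. What goes over the connection stays the same.
-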
- (For more details about the protocols mentioned in this section, see
- rfc821.txt and rfc822.txt for mail, rfc959.txt for file transfer, and
- rfc854.txt and rfc855.txt for remote logins. For the well-known port
- numbers, see the current edition of Assigned Numbers, and possibly
- rfc814.txt.)
-
-
- Protocols other than TCP: UDP and ICMP
- ======================================
-
- So far, we have described only connections that use TCP. Recall that
- TCP is responsible for breaking up messages into packets, and
- reassembling them properly. However in many applications, we have
- messages that will always fit in a single packet. An example is name
- lookup. When a user attempts to make a connection to another system,
- he will generally specify the system by name, rather than Internet
- address. His system has to translate that name to an address before
- it can do anything. Generally, only a few systems have the database
- used to translate names to addresses. So the user's system will want
- to send a query to one of the systems that has the database. This
- query is going to be very short. It will certainly fit in one packet.
- So will the answer. Thus it seems silly to use TCP. Of course TCP
- does more than just break things up into packets. It also makes sure
- that the data arrives, resending packets where necessary. But for a
- question that fits in a single packet, we don't need all the
- complexity of TCP to do this. If we don't get an answer after a few
- seconds, we can just ask again. For applications like this, there are
- alternatives to TCP.
-
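- The "just ask again" approach might be sketched like this. The
- function and parameter names are invented, and the two callables
- stand in for whatever the network software provides for sending a
- query and waiting for a reply. Here they are faked so that the first
- query gets lost.
-
```python
def query_with_retry(send_query, wait_for_reply, attempts=3, timeout=2.0):
    """Send a one-packet query, asking again if no answer arrives in time."""
    for _ in range(attempts):
        send_query()
        reply = wait_for_reply(timeout)    # None means nothing arrived
        if reply is not None:
            return reply
    return None                            # give up after the last attempt


# Fake network: the first query is lost, the second is answered.
replies = iter([None, "128.6.4.7"])
sent = []
answer = query_with_retry(lambda: sent.append("query"),
                          lambda timeout: next(replies))
print(answer, len(sent))       # 128.6.4.7 2
```
-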
- The most common alternative is UDP ("user datagram protocol"). UDP is
- designed for applications where you don't need to put sequences of
- packets together. It fits into the system much like TCP. There is a
- UDP header. The network software puts the UDP header on the front of
- your data, just as it would put a TCP header on the front of your
- data. Then UDP sends the data to IP, which adds the IP header,
- putting UDP's protocol number in the protocol field instead of TCP's
- protocol number. However UDP doesn't do as much as TCP does. It
- doesn't split data into multiple packets. It doesn't keep track of
- what it has sent so it can resend if necessary. About all that UDP
- provides is port numbers, so that several programs can use UDP at
- once. UDP port numbers are used just like TCP port numbers. There
- are well-known port numbers for servers that use UDP. Note that the
- UDP header is shorter than a TCP header. It still has source and
- destination port numbers, and a checksum, but that's about it. No
- sequence number, since it is not needed. UDP is used by the protocols
- that handle name lookups (see ien-116.txt, rfc882.txt, and
- rfc883.txt), and a number of similar protocols.
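-
- To make the size of the UDP header concrete, here is a minimal Python
- sketch that builds and parses one. The four 16-bit fields (source
- port, destination port, length, checksum) follow rfc768.txt. The
- helper names are hypothetical, and the checksum is left at zero, which
- UDP defines as "no checksum computed"; a real implementation would
- normally compute it.

```python
import struct

# Source port, destination port, length, checksum: four 16-bit fields
# in network byte order, 8 octets in total.
UDP_HEADER = struct.Struct("!HHHH")


def build_udp_datagram(src_port, dst_port, data, checksum=0):
    """Put a UDP header on the front of `data` (checksum 0 = not computed)."""
    length = UDP_HEADER.size + len(data)  # the length field covers header + data
    return UDP_HEADER.pack(src_port, dst_port, length, checksum) + data


def parse_udp_datagram(datagram):
    """Return (source port, destination port, checksum, data)."""
    src_port, dst_port, length, checksum = UDP_HEADER.unpack_from(datagram)
    return src_port, dst_port, checksum, datagram[UDP_HEADER.size:length]


# Example: a query from an arbitrary local port (1234 is made up) to
# port 53, the well-known port for domain name servers.
datagram = build_udp_datagram(1234, 53, b"hello")
fields = parse_udp_datagram(datagram)
```

- Note that there is nothing here corresponding to TCP's sequence and
- acknowledgement numbers: the whole header is 8 octets.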
-
- Another alternative protocol is ICMP ("Internet control message
- protocol"). ICMP is used for error messages, and other messages
- intended for the TCP/IP software itself, rather than any particular
- user program. For example, if you attempt to connect to a host, your
- system may get back an ICMP message saying "host unreachable". ICMP
- can also be used to find out some information about the network. See
- rfc792.txt for details of ICMP. ICMP is similar to UDP, in that it
- handles messages that fit in one packet. However it is even simpler
- than UDP. It doesn't even have port numbers in its header. Since all
- ICMP messages are interpreted by the network software itself, no port
- numbers are needed to say where an ICMP message is supposed to go.
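-
- As a rough illustration of how little there is in an ICMP header, the
- following Python sketch decodes the first four octets (type, code,
- checksum) of a message. The type and code numbers are the ones given
- in rfc792.txt; the function name and the wording of the descriptions
- are just assumptions made for this example, and only a few types are
- listed.

```python
import struct

# A small subset of the message types defined in RFC 792.
ICMP_TYPES = {
    0: "echo reply",
    3: "destination unreachable",
    5: "redirect",
    8: "echo",
}

# Codes that qualify a "destination unreachable" message.
UNREACHABLE_CODES = {
    0: "net unreachable",
    1: "host unreachable",
    2: "protocol unreachable",
    3: "port unreachable",
}


def describe_icmp(message):
    """Return a short description of an ICMP message given as bytes."""
    msg_type, code, _checksum = struct.unpack_from("!BBH", message)
    name = ICMP_TYPES.get(msg_type, "type %d" % msg_type)
    if msg_type == 3:
        name += ": " + UNREACHABLE_CODES.get(code, "code %d" % code)
    return name


# Type 3, code 1, checksum left as zero for the example.
example = bytes([3, 1, 0, 0]) + bytes(4)
description = describe_icmp(example)
```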
-
-
- Routing
- =======
-
- The description above indicated that the IP implementation is
- responsible for getting packets to the destination indicated by the
- destination address, but little was said about how this would be done.
- The task of finding how to get a packet to its destination is referred
- to as "routing". In fact many of the details depend upon the
- particular implementation. However some general things can be said.
-
- First, it is necessary to understand the model on which IP is based.
- IP assumes that a system is attached to some local network. We assume
- that the system can send packets to any other system on its own
- network. (In the case of Ethernet, it simply finds the Ethernet
- address of the destination system, and puts the packet out on the
- Ethernet.) The problem comes when a system is asked to send a packet
- to a system on a different network. This problem is handled by
- gateways. A gateway is a system that connects a network with one or
- more other networks. Gateways are often normal computers that happen
- to have more than one network interface. For example, we have a Unix
- machine that has two different Ethernet interfaces. Thus it is
- connected to networks 128.6.4 and 128.6.3. This machine can act as a
- gateway between those two networks. The software on that machine must
- be set up so that it will forward packets from one network to the
- other. That is, if a machine on network 128.6.4 sends a packet to the
- gateway, and the packet is addressed to a machine on network 128.6.3,
- the gateway will forward the packet to the destination. Major
- communications centers often have gateways that connect a number of
- different networks.
-
- Routing in IP is based entirely upon the network number of the
- destination address. Each computer has a table of network numbers.
- For each network number, a gateway is listed. This is the gateway to
- be used to get to that network. Note that the gateway doesn't have to
- connect directly to the network. It just has to be the best place to
- go to get there. For example at Rutgers, our interface to NSFnet
- is at the John von Neumann Supercomputer Center (JvNC). Our connection
- to JvNC is via a high-speed serial line connected to a gateway whose
- address is 128.6.3.12. Systems on net 128.6.3 will list 128.6.3.12 as
- the gateway for many off-campus networks. However systems on net
- 128.6.4 will list 128.6.4.1 as the gateway to those same off-campus
- networks. 128.6.4.1 is the gateway between networks 128.6.4 and
- 128.6.3, so it is the first step in getting to JvNC.
-
- When a computer wants to send a packet, it first checks to see if the
- destination address is on the system's own local network. If so, the
- packet can be sent directly. Otherwise, the system expects to find an
- entry in its routing table for the network that the destination
- address is on. The packet is sent to the gateway listed in that entry.
- This table can get quite big. For example, the Internet now includes
- several hundred
- individual networks. Thus various strategies have been developed to
- reduce the size of the routing table. One strategy is to depend upon
- "default routes". Often, there is only one gateway out of a network.
- This gateway might connect a local Ethernet to a campus-wide backbone
- network. In that case, we don't need to have a separate entry for
- every network in the world. We simply define that gateway as a
- "default". When no specific route is found for a packet, the packet
- is sent to the default gateway. A default gateway can even be used
- when there are several gateways on a network. There are provisions
- for gateways to send a message saying "I'm not the best gateway -- use
- this one instead." (The message is sent via ICMP. See rfc792.txt)
- Most network software is designed to use these messages to add entries
- to its routing table. Suppose network 128.6.4 has two gateways,
- 128.6.4.59 and 128.6.4.1. 128.6.4.59 leads to several other internal
- Rutgers networks. 128.6.4.1 leads indirectly to the NSFnet. Suppose
- we set 128.6.4.59 as a default gateway, and have no other routing
- table entries. Now what happens when we need to send a packet to MIT?
- MIT is network 18. Since we have no entry for network 18, the packet
- will be sent to the default, 128.6.4.59. As it happens, this gateway
- is the wrong one. So it will forward the packet to 128.6.4.1. But it
- will also send back an ICMP redirect saying in effect: "to get to
- network 18, use 128.6.4.1". Our software will then add an entry to the routing
- table. Any future packets to MIT will then go directly to 128.6.4.1.
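-
- The MIT example can be followed step by step in a small Python
- sketch. It is deliberately simplified: the class and method names are
- hypothetical, the check for destinations on the local network is left
- out, and 18.0.0.5 is a made-up host address on network 18.

```python
def network_number(address):
    """Return the network number: the first 1, 2 or 3 octets,
    depending on the range the first octet falls in."""
    octets = address.split(".")
    first = int(octets[0])
    if first < 128:
        count = 1
    elif first < 192:
        count = 2
    else:
        count = 3
    return ".".join(octets[:count])


class Host:
    """A host that starts with only a default gateway."""

    def __init__(self, default_gateway):
        self.default_gateway = default_gateway
        self.routes = {}  # network number -> gateway

    def next_hop(self, destination):
        """Gateway to use for `destination` (local delivery not modelled)."""
        return self.routes.get(network_number(destination), self.default_gateway)

    def handle_redirect(self, destination, better_gateway):
        """React to "to get to that network, use better_gateway"."""
        self.routes[network_number(destination)] = better_gateway


host = Host(default_gateway="128.6.4.59")
first_try = host.next_hop("18.0.0.5")        # no entry for network 18 yet
host.handle_redirect("18.0.0.5", "128.6.4.1")
second_try = host.next_hop("18.0.0.5")       # now uses the learned route
```

- The first packet goes to the default gateway, 128.6.4.59. After the
- redirect has been processed, later packets for network 18 go to
- 128.6.4.1, while everything else still uses the default.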
-
- Most IP experts recommend that individual computers should not try to
- keep track of the entire network. Instead, they should start with
- default gateways, and let the gateways tell them the routes, as just
- described. However this doesn't say how the gateways should find out
- about the routes. The gateways can't depend upon this strategy. They
- have to have fairly complete routing tables. (It is possible to do
- hierarchical routing, where all of the gateways on a campus know about
- the campus network, but direct all off-campus traffic to a single
- gateway with connections off-campus.) For this, some sort of routing
- protocol is needed. A routing protocol is simply a technique for the
- gateways to find each other, and keep up to date about the best way to
- get to every network. rfc1009.txt contains a review of gateway design
- and routing. However rip.doc is probably a better introduction to the
- subject. It contains some tutorial material, and a detailed
- description of the most commonly-used routing protocol.
-
-
- Details about Internet addresses: subnets and broadcasting
- ==========================================================
-
- As indicated above, Internet addresses are 32-bit numbers, normally
- written as 4 octets (in decimal), e.g. 128.6.4.7. There are actually
- 3 different types of address. The problem is that the address has to
- indicate both the network and the host within the network. It was
- felt that eventually there would be lots of networks. Many of them
- would be small, but probably 24 bits would be needed to represent all
- the IP networks. It was also felt that some very big networks might
- need 24 bits to represent all of their hosts. This would seem to lead
- to 48 bit addresses. But the designers really wanted to use 32 bit
- addresses. So they adopted a kludge. The assumption is that most of
- the networks will be small. So they set up three different ranges of
- address. Addresses beginning with 1 to 126 use only the first octet
- for the network number. The other three octets are available for the
- host number. Thus 24 bits are available for hosts. These numbers are
- used for large networks. But there can only be 126 of these very big
- networks. The Arpanet is one, and there are a few large commercial
- networks. But few normal organizations get one of these "class A"
- addresses. For normal large organizations, "class B" addresses are
- used. Class B addresses use the first two octets for the network
- number. Thus network numbers are 128.1 through 191.254. (We avoid 0
- and 255, for reasons that we see below. We also avoid addresses
- beginning with 127, because that is used by some systems for special
- purposes.) The last two octets are available for host addresses,
- giving 16 bits of host address. This allows for 64516 computers,
- which should be enough for most organizations. (It is possible to get
- more than one class B address, if you run out.) Finally, class C
- addresses use three octets, in the range 192.1.1 to 223.254.254.
- These allow only 254 hosts on each network, but there can be lots of
- these networks. Addresses above 223 are reserved for future use, as
- class D and E (which are currently not defined).
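-
- The three ranges can be summarized in a short Python function. This
- is a sketch of the rules just described, not code from any particular
- implementation; the name classify and the form of its result are
- assumptions made for the example.

```python
def classify(address):
    """Split a dotted-decimal address into (class, network, host).

    Returns ("reserved", None, None) for addresses beginning with 0,
    127, or anything above 223.
    """
    octets = [int(part) for part in address.split(".")]
    if len(octets) != 4 or not all(0 <= o <= 255 for o in octets):
        raise ValueError("not a dotted-decimal Internet address: %r" % address)
    first = octets[0]
    if 1 <= first <= 126:
        cls, count = "A", 1   # 1 octet of network, 3 octets of host
    elif 128 <= first <= 191:
        cls, count = "B", 2   # 2 octets of network, 2 octets of host
    elif 192 <= first <= 223:
        cls, count = "C", 3   # 3 octets of network, 1 octet of host
    else:
        return ("reserved", None, None)
    network = ".".join(str(o) for o in octets[:count])
    host = ".".join(str(o) for o in octets[count:])
    return (cls, network, host)


rutgers = classify("128.6.4.7")
```

- For 128.6.4.7 this gives class B, network 128.6, and host part 4.7.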
-
- Many large organizations find it convenient to divide their network
- number into "subnets". For example, Rutgers has been assigned a class
- B address, 128.6. We find it convenient to use the third octet of the
- address to indicate which Ethernet a host is on. This division has no
- significance outside of Rutgers. A computer at another institution
- would send any packet whose destination address began with 128.6 on
- the best route to Rutgers. They would not have different routes for
- 128.6.4 or 128.6.5. But inside Rutgers, we treat 128.6.4 and 128.6.5
- as separate networks. In effect, gateways inside Rutgers have
- separate entries for each Rutgers subnet, whereas gateways outside
- Rutgers just have one entry for 128.6. Note that we could do exactly
- the same thing by using a separate class C address for each Ethernet.
- As far as Rutgers is concerned, it would be just as convenient for us
- to have a number of class C addresses. However using class C
- addresses would make things inconvenient for the rest of the world.
- Every institution that wanted to talk to us would have to have a
- separate entry for each one of our networks. If every institution did
- this, there would be far too many networks for any reasonable gateway
- to keep track of. By subdividing a class B network, we hide our
- internal structure from everyone else, and save them trouble. This
- subnet strategy requires special provisions in the network software.
- It is described in rfc950.txt.
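-
- The difference between the inside and the outside view can be shown
- with a bit mask, which is the mechanism rfc950.txt uses to describe
- subnets. The Python below is a minimal sketch; the helper names are
- hypothetical, and the mask 255.255.255.0 simply encodes the choice of
- using the third octet as the subnet number.

```python
def to_int(address):
    """Convert dotted decimal to a 32-bit integer."""
    a, b, c, d = (int(part) for part in address.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d


def to_dotted(value):
    """Convert a 32-bit integer back to dotted decimal."""
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


def same_network(addr1, addr2, mask):
    """True if both addresses agree in every bit selected by `mask`."""
    m = to_int(mask)
    return (to_int(addr1) & m) == (to_int(addr2) & m)


# Outside Rutgers only the class B network number 128.6 matters.
outside_view = same_network("128.6.4.7", "128.6.5.9", "255.255.0.0")

# Inside Rutgers the third octet is treated as part of the network number.
inside_view = same_network("128.6.4.7", "128.6.5.9", "255.255.255.0")

subnet = to_dotted(to_int("128.6.4.7") & to_int("255.255.255.0"))
```

- With the class B mask the two hosts appear to be on the same network;
- with the subnet mask they are on different ones (128.6.4 and 128.6.5).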
-
- 0 and 255 have special meanings. 0 is reserved for machines that
- don't know their address. In certain circumstances it is possible for
- a machine not to know the number of the network it is on, or even its
- own host address. So 0.0.0.23 would be a machine that knew it was
- host number 23, but didn't know on what network.
-
- 255 is used for "broadcast". A broadcast is a message that you want
- every system on the network to see. Broadcasts are used in some
- situations where you don't know who to talk to. For example, suppose
- you need to look up a host name and get its Internet address.
- Sometimes you don't know the address of the system that has the host
- name data base. In that case, you might send the request as a
- broadcast. There are also cases where a number of systems are
- interested in information. It is then less expensive to send a single
- broadcast than to send packets individually to each host that is
- interested in the information. In order to send a broadcast, you use
- an address that is made by using your network address, with all ones
- in the part of the address where the host number goes. For example,
- if you are on network 128.6.4, you would use 128.6.4.255 for
- broadcasts. How this is actually implemented depends upon the medium.
- It is not possible to send broadcasts on the Arpanet, or on point to
- point lines. However it is possible on an Ethernet. If you use an
- Ethernet address with all its bits on (all ones), every machine on the
- Ethernet is supposed to look at that packet.
-
- Although the official broadcast address for network 128.6.4 is now
- 128.6.4.255, there are some other addresses that may be treated as
- broadcasts by certain implementations. For convenience, the standard
- also allows 255.255.255.255 to be used. This refers to all hosts on
- the local network. It is often simpler to use 255.255.255.255 instead
- of finding out the network number for the local network and forming a
- broadcast address such as 128.6.4.255. In addition, certain older
- implementations may use 0 instead of 255 to form the broadcast
- address, e.g. 128.6.4.0. Finally, certain older implementations may
- not understand about subnets. Thus they consider the network number
- to be 128.6. In that case, they will assume a broadcast address of
- 128.6.255.255 or 128.6.0.0. Until support for broadcasts is
- implemented properly, it can be a somewhat dangerous feature to use.
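-
- The various addresses that might be treated as broadcasts can be
- computed mechanically. The Python sketch below is an illustration
- only: the function names are made up, and the two masks passed in
- (one with subnets, one without) are assumptions matching the Rutgers
- example.

```python
def to_int(address):
    """Convert dotted decimal to a 32-bit integer."""
    a, b, c, d = (int(part) for part in address.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d


def to_dotted(value):
    """Convert a 32-bit integer back to dotted decimal."""
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


def possible_broadcasts(address, subnet_mask, network_mask):
    """List addresses that some implementation may treat as a broadcast."""
    candidates = ["255.255.255.255"]  # all hosts on the local network
    for mask in (subnet_mask, network_mask):
        m = to_int(mask)
        network = to_int(address) & m
        host_bits = ~m & 0xFFFFFFFF
        candidates.append(to_dotted(network | host_bits))  # host part all ones
        candidates.append(to_dotted(network))              # older style: all zeros
    return candidates


candidates = possible_broadcasts("128.6.4.7", "255.255.255.0", "255.255.0.0")
```

- For a host on 128.6.4 this yields 255.255.255.255, 128.6.4.255,
- 128.6.4.0, 128.6.255.255 and 128.6.0.0, which are exactly the forms
- mentioned above.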
-
- Because 0 and 255 are used for unknown and broadcast addresses, normal
- hosts should never be given addresses containing 0 or 255. Addresses
- should never begin with 0, 127, or any number above 223. Addresses
- violating these rules are sometimes referred to as "Martians", because
- of rumors that the Central University of Mars is using network 225.
-
-
- Packet splitting and reassembly
- ===============================
-
- TCP/IP is designed for use with many different kinds of network.
- Unfortunately, network designers do not agree about how big packets
- can be. Ethernet packets can be 1500 octets long. Arpanet packets
- have a maximum of around 1000 octets. Some very fast networks have
- much larger packet sizes. At first, you might think that IP should
- simply settle on the smallest possible size. Unfortunately, this
- would cause serious performance problems. When transferring large
- files, big packets are far more efficient than small ones. So we want
- to be able to use the largest packet size possible. But we also want
- to be able to handle networks with small limits. There are two
- provisions for this. First, TCP has the ability to "negotiate" about
- packet size. When a TCP connection first opens, both ends can send
- the maximum packet size they can handle. The smaller of these numbers
- is used for the rest of the connection. This allows two
- implementations that can handle big packets to use them, but also lets
- them talk to implementations that can't handle them. However this
- doesn't completely solve the problem. The most serious problem is
- that the two ends don't necessarily know about all of the steps in
- between. For example, when sending data between Rutgers and Berkeley,
- it is likely that both computers will be on Ethernets. Thus they will
- both be prepared to handle 1500-octet packets. However the connection
- will at some point end up going over the Arpanet. It can't handle
- packets of that size. For this reason, there are provisions to split
- packets up into pieces. The IP header contains fields indicating that
- a packet has been split, and enough information to let the pieces be
- put back together. If a gateway connects an Ethernet to the Arpanet,
- it must be prepared to take 1500-octet Ethernet packets and split them
- into pieces that will fit on the Arpanet. Furthermore, every
- implementation of TCP/IP must be prepared to accept pieces and put
- them back together. This is referred to as "reassembly".
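-
- Splitting and reassembly can be sketched as follows. In the real IP
- header (see rfc791.txt) each piece carries an offset measured in units
- of 8 octets and a "more fragments" flag; the Python below models just
- those two fields. The function names, the tuple layout, the fixed
- 20-octet header and the 1000-octet limit used in the example are all
- assumptions made for illustration.

```python
def fragment(payload, max_packet, header_len=20):
    """Split `payload` into (offset_in_8_octet_units, more_fragments, data)."""
    # Every piece except the last must carry a multiple of 8 octets.
    max_data = (max_packet - header_len) // 8 * 8
    if max_data <= 0:
        raise ValueError("packet size too small to carry any data")
    pieces = []
    for start in range(0, len(payload), max_data):
        chunk = payload[start:start + max_data]
        more = start + max_data < len(payload)
        pieces.append((start // 8, more, chunk))
    return pieces


def reassemble(pieces):
    """Put pieces back together; return None if any piece is still missing."""
    ordered = sorted(pieces, key=lambda piece: piece[0])
    data = b""
    for offset_units, _more, chunk in ordered:
        if offset_units * 8 != len(data):
            return None  # gap: an earlier piece has not arrived
        data += chunk
    if not ordered or ordered[-1][1]:
        return None      # the final piece has not arrived
    return data


# 1480 octets of data: a 1500-octet packet minus a 20-octet IP header.
payload = bytes(i % 251 for i in range(1480))
pieces = fragment(payload, max_packet=1000)
restored = reassemble(list(reversed(pieces)))  # arrival order does not matter
```

- Here the data is split into two pieces of 976 and 504 octets, and the
- receiver gets the original data back even if the pieces arrive out of
- order.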
-
- TCP/IP implementations differ in the approach they take to deciding on
- packet size. It is fairly common for implementations to use 576-byte
- packets whenever they can't verify that the entire path is able to
- handle larger packets. The problem is that many implementations have
- bugs in the code to reassemble pieces. So many implementors try to
- avoid ever having splits occur. Different implementors take different
- approaches to deciding when it is safe to use large packets. Some use
- them only for the local network. Others will use them for any network
- on the same campus. 576 bytes is a "safe" size, which every
- implementation must support.
-
-
- Ethernet encapsulation: ARP
- ===========================
-
- There was a brief discussion above about what IP packets looked like
- on an Ethernet. The discussion showed the Ethernet header and
- checksum. However it left one hole: It didn't say how to figure out
- what Ethernet address to use when you want to talk to a given Internet
- address. In fact, there is a separate protocol for this, called ARP
- ("address resolution protocol"). Note by the way that ARP is not an
- IP protocol. That is, the ARP packets do not have IP headers.
- Suppose you are on system 128.6.4.194 and you want to connect to
- system 128.6.4.7. Your system will first verify that 128.6.4.7 is on
- the same network, so it can talk directly via Ethernet. Then it will
- look up 128.6.4.7 in its ARP table, to see if it already knows the
- Ethernet address. If so, it will stick on an Ethernet header, and
- send the packet. But suppose this system is not in the ARP table.
- There is no way to send the packet, because you need the Ethernet
- address. So it uses the ARP protocol to send an ARP request.
- Essentially an ARP request says "I need the Ethernet address for
- 128.6.4.7". Every system listens to ARP requests. When a system sees
- an ARP request for itself, it is required to respond. So 128.6.4.7
- will see the request, and will respond with an ARP reply saying in
- effect "128.6.4.7 is 8:0:20:1:56:34". (Recall that Ethernet addresses
- are 48 bits. This is 6 octets. Ethernet addresses are conventionally
- shown in hex, using the punctuation shown.) Your system will save
- this information in its ARP table, so future packets will go directly.
- Most systems treat the ARP table as a cache, and clear entries in it
- if they have not been used in a certain period of time.
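-
- An ARP table that behaves as a cache can be sketched like this. The
- class name, the method names and the 20-minute idle limit are
- hypothetical; actual systems differ in how and when they clear
- entries.

```python
import time


class ArpCache:
    """Maps Internet addresses to Ethernet addresses, dropping idle entries."""

    def __init__(self, max_idle=1200.0, clock=time.monotonic):
        self.max_idle = max_idle  # seconds an entry may go unused (assumed value)
        self.clock = clock
        self.entries = {}         # IP address -> (Ethernet address, time last used)

    def learn(self, ip, ethernet):
        """Record the contents of an ARP reply."""
        self.entries[ip] = (ethernet, self.clock())

    def lookup(self, ip):
        """Return the Ethernet address, or None if an ARP request is needed."""
        entry = self.entries.get(ip)
        if entry is None:
            return None
        ethernet, last_used = entry
        now = self.clock()
        if now - last_used > self.max_idle:
            del self.entries[ip]  # not used recently: clear it and ask again
            return None
        self.entries[ip] = (ethernet, now)  # using an entry keeps it alive
        return ethernet


cache = ArpCache()
cache.learn("128.6.4.7", "8:0:20:1:56:34")
address = cache.lookup("128.6.4.7")
```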
-
- Note by the way that ARP requests must be sent as "broadcasts". There
- is no way that an ARP request can be sent directly to the right system. After
- all, the whole reason for sending an ARP request is that you don't
- know the Ethernet address. So an Ethernet address of all ones is
- used, i.e. ff:ff:ff:ff:ff:ff. By convention, every machine on the
- Ethernet is required to pay attention to packets with this as an
- address. So every system sees every ARP request. They all look to
- see whether the request is for their own address. If so, they
- respond. If not, they could just ignore it. (Some hosts will use ARP
- requests to update their knowledge about other hosts on the network,
- even if the request isn't for them.) Note that packets whose IP
- address indicates broadcast (e.g. 255.255.255.255 or 128.6.4.255) are
- also sent with an Ethernet address that is all ones.
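-
- For the curious, here is roughly what a broadcast ARP request looks
- like as octets on the wire, following the packet layout in
- rfc826.txt. The helper names are hypothetical, the sender's Ethernet
- address 8:0:20:1:2:3 is made up, and the padding and checksum that
- the Ethernet hardware adds are not shown.

```python
import struct


def mac_bytes(text):
    """Convert an Ethernet address such as 8:0:20:1:56:34 to 6 octets."""
    return bytes(int(part, 16) for part in text.split(":"))


def ip_bytes(text):
    """Convert a dotted-decimal Internet address to 4 octets."""
    return bytes(int(part) for part in text.split("."))


def build_arp_request(sender_mac, sender_ip, target_ip):
    """Build the Ethernet header plus ARP request asking for `target_ip`."""
    ethernet_header = (
        mac_bytes("ff:ff:ff:ff:ff:ff")   # destination: all ones (broadcast)
        + mac_bytes(sender_mac)          # source
        + struct.pack("!H", 0x0806)      # Ethernet type code for ARP
    )
    arp = struct.pack(
        "!HHBBH",
        1,        # hardware type: Ethernet
        0x0800,   # protocol type: IP
        6,        # octets in an Ethernet address
        4,        # octets in an Internet address
        1,        # operation: 1 = request (2 = reply)
    )
    arp += mac_bytes(sender_mac) + ip_bytes(sender_ip)
    arp += bytes(6) + ip_bytes(target_ip)  # target Ethernet address unknown
    return ethernet_header + arp


frame = build_arp_request("8:0:20:1:2:3", "128.6.4.194", "128.6.4.7")
```

- Note that there is no IP header anywhere in this frame: the ARP data
- follows the Ethernet header directly.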
-
-
- Getting more information
- ========================
-
- This directory contains documents describing the major protocols.
- There are literally hundreds of documents, so we have chosen the ones
- that seem most important. Internet standards are called RFC's. RFC
- stands for Request for Comments. A proposed standard is initially
- issued as a proposal, and given an RFC number. When it is finally
- accepted, it is added to Official Internet Protocols, but it is still
- referred to by the RFC number. We have also included two IEN's.
- (IEN's are an older form of RFC.) The convention is that whenever an
- RFC is revised, the revised version gets a new number. This is fine
- for most purposes, but it causes problems with two documents: Assigned
- Numbers and Official Internet Protocols. These documents are being
- revised all the time, so the RFC number keeps changing. You will have
- to look in rfc-index.txt to find the number of the latest edition.
- Anyone who is seriously interested in TCP/IP should read the RFC
- describing IP (791). RFC 1009 is also useful. It is a specification
- for gateways to be used by NSFnet. As such, it contains an overview
- of a lot of the TCP/IP technology. You should probably also read the
- description of at least one of the application protocols, just to get
- a feel for the way things work. Mail is probably a good one
- (821/822). TCP (793) is of course a very basic specification.
- However the spec is fairly complex, so you should only read this when
- you have the time and patience to think about it carefully.
- Fortunately, the author of the major RFC's (Jon Postel) is a very good
- writer. The TCP RFC is far easier to read than you would expect,
- given the complexity of what it is describing. You can look at the
- other RFC's as you become curious about their subject matter.
-
- Here is a list of the documents you are more likely to want:
-
- rfc-index - list of all RFC's
- rfc1012 - somewhat fuller list of all RFC's
- rfc1011 - Official Protocols. It's useful to scan this to
-     see what tasks protocols have been built for. This defines
-     which RFC's are actual standards, as opposed to requests
-     for comments.
- rfc1010 - Assigned Numbers. If you are working with TCP/IP,
-     you will probably want a hardcopy of this as a reference.
-     It's not very exciting to read. It lists all the officially
-     defined well-known ports and lots of other things.
- rfc1009 - NSFnet gateway specifications. A good overview of
-     IP routing and gateway technology.
- rfc973 - update on domains
- rfc959 - FTP (file transfer)
- rfc950 - subnets
- rfc894 - how IP is to be put on Ethernet, see also rfc826
- rfc882/3 - domains (the database used to go from host names to
-     Internet address and back -- also used to handle UUCP
-     these days). See also rfc973
- rfc854/5 - telnet - protocol for remote logins
- rfc826 - ARP - protocol for finding out Ethernet addresses
- rfc821/2 - mail
- rfc814 - names and ports - general concepts behind well-known ports
- rfc793 - TCP
- rfc792 - ICMP
- rfc791 - IP
- rfc768 - UDP
- rip.doc - details of the most commonly-used routing protocol
- ien-116 - old name server (still needed by several kinds of system)
- ien-48 - the Catenet model, general description of the philosophy
-     behind TCP/IP
-
- The following documents are somewhat more specialized.
-
- rfc813 - window and acknowledgement strategies in TCP
- rfc815 - packet reassembly techniques
- rfc816 - fault isolation and resolution techniques
- rfc817 - modularity and efficiency in implementation
- rfc879 - the maximum segment size option in TCP
- rfc896 - congestion control
- rfc827,888,904,975,985 - EGP
-
- To those of you who may be reading this document remotely instead of
- at Rutgers: The most important RFC's have been collected into a
- three-volume set, the DDN Protocol Handbook. It is available from the
- DDN Network Information Center, SRI International, 333 Ravenswood
- Avenue, Menlo Park, California 94025 (telephone: 800-235-3155).
- You should be able to get them via anonymous FTP from sri-nic.arpa.
- File names are:
- RFC's:
-     rfc:rfc-index.txt
-     rfc:rfcxxx.txt
- IEN's:
-     ien:ien-index.txt
-     ien:ien-xxx.txt
- rip.doc is available by anonymous FTP from topaz.rutgers.edu, as
- /pub/tcp-ip-docs/rip.doc.
-
- Sites with access to UUCP but not FTP may be able to retrieve them
- via UUCP from UUCP host rutgers. The file names would be
- RFC's:
- /topaz/pub/pub/tcp-ip-docs/rfc-index.txt
- /topaz/pub/pub/tcp-ip-docs/rfcxxx.txt
- IEN's:
- /topaz/pub/pub/tcp-ip-docs/ien-index.txt
- /topaz/pub/pub/tcp-ip-docs/ien-xxx.txt
- /topaz/pub/pub/tcp-ip-docs/rip.doc
- Note that SRI-NIC has the entire set of RFC's and IEN's, but rutgers
- and topaz have only those specifically mentioned above.
-